Analyzing Aviation Safety Reports: From Topic Modeling to Scalable Multi-Label Classification
Authors
Abstract
The Aviation Safety Reporting System (ASRS) collects voluntarily submitted aviation safety reports from pilots, controllers, and others, which makes it particularly useful for researching aviation safety deficiencies. In this paper, we address two challenges in the analysis of ASRS data: (1) the unsupervised extraction of meaningful and interpretable topics from ASRS reports and (2) the multi-label classification of ASRS data based on a set of predefined categories. For topic modeling, we investigate the practical usefulness of Latent Dirichlet Allocation (LDA) for modeling ASRS reports in terms of interpretable topics. We also use LDA to generate a more compact representation of ASRS reports for use in multi-label classification. For multi-label classification, we propose a novel and highly scalable algorithm based on multivariate regression. Empirical results indicate that our approach is superior to several baseline and state-of-the-art approaches.
Similar resources
Correlated Topics in a Scalable Multidimensional Text Cube: Algorithms and Aviation Safety Case Study
As world-wide air traffic continues to grow even at a modest pace, the overall complexity of the system will increase significantly. This increased complexity can lead to a larger number of fatalities per year even if the extremely low fatality rate that we currently enjoy is maintained. One important source of information about the safety of the aviation system is in Aviation Safety Text Repor...
Using Structural Topic Modeling to Explore Aviation Safety Reporting System Data
The Aviation Safety Reporting System includes over a million confidential reports describing safety incidents. Natural language processing techniques allow for relatively rapid and largely automated analysis of large collections of text data. Meaningful interpretation of the results and further investigations by subject matter experts can follow. This article describes the application of struct...
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in the text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
Cause Identification from Aviation Safety Incident Reports via Weakly Supervised Semantic Lexicon Construction
The Aviation Safety Reporting System collects voluntarily submitted reports on aviation safety incidents to facilitate research work aiming to reduce such incidents. To effectively reduce these incidents, it is vital to accurately identify why these incidents occurred. More precisely, given a set of possible causes, or shaping factors, this task of cause identification involves identifying all ...
Scalable multi-output label prediction: From classifier chains to classifier trellises
Multi-output inference tasks, such as multi-label classification, have become increasingly important in recent years. A popular method for multi-label classification is classifier chains, in which the predictions of individual classifiers are cascaded along a chain, thus taking into account inter-label dependencies and improving the overall performance. Several varieties of classifier chain met...